Objectives

This working paper discusses the theoretical problem of the regionalisation of a world (in the abstract sense) through the empirical example of The World (where we live), described by trade flows over a long period of time and for different types of products.

For that purpose we use the CHELEM database produced by the CEPII, which offers an exceptional coverage of trade flows over more than 50 years, from 1967 to the present (2020). The most detailed version of this database describes the exchanges between 94 territorial units (states or groups of states) for 72 types of goods over 54 years. Once internal flows are excluded, this is a 4-dimensional object (hypercube) of size \(94 \times 93 \times 72 \times 54 = 33988896\) cells.

For our experiment, we use a reduction of the database based on 12 territorial units, described by 9 groups of goods for 5 periods of 10 years each. The hypercube used in our experiment is therefore limited to \(12 \times 11 \times 9 \times 5 = 5940\) cells. This may appear rather limited but, as we will demonstrate, the complexity of such an object is already very high. It seems better to establish the theoretical foundations of the research on such an object before addressing larger databases, where computational problems will grow exponentially.
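The two sizes quoted above can be checked with a short sketch. The function name `hypercube_cells` is introduced here for illustration only; it assumes, as the formulas do, that each unit is paired with every other unit but not with itself.

```python
def hypercube_cells(units: int, products: int, periods: int) -> int:
    """Number of cells of an origin x destination x product x period hypercube.

    The diagonal (flows internal to a unit) is excluded, hence units * (units - 1).
    """
    return units * (units - 1) * products * periods


full_size = hypercube_cells(units=94, products=72, periods=54)
reduced_size = hypercube_cells(units=12, products=9, periods=5)

print(full_size)     # detailed CHELEM hypercube
print(reduced_size)  # experimental hypercube used in this paper
```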

Our overarching question can now be formulated in the following way :

I. EXPERIMENTAL DATABASE

The 94 territorial units of CHELEM

The original version of the CHELEM database is made of 94 territorial units. A majority of these territorial units correspond to states, but some of them are aggregates of states for which it was difficult to separate or to collect trade flows. The map below indicates which territorial units do not fit with the international division of the world in states.

The aggregates of states are generally based on groups of small states (as in Central America or Oceania), but they can also concern larger groups of states playing an important role in trade, as in the case of the aggregate of Iraq, Iran and Kuwait. The aggregation is also very large in the case of sub-Saharan Africa, where only a few states are identified and the others are merged in large areas that are not necessarily contiguous. At the same time, Europe is fully disaggregated in isolated states, except in the case of Malta and Cyprus, which mechanically increases the trade flows recorded in this part of the world. If the USA were divided into its 50 states, and China or India into provinces or states, their share of exchanges would necessarily increase.

We are therefore facing a difficult instance of the Modifiable Areal Unit Problem (MAUP), which cannot be easily solved without deciding immediately to aggregate the data in larger, more homogeneous units, where internal flows are systematically removed. This of course produces a strong reduction of the initial information, but it makes possible a better analysis of the relations between the new territorial units.

The 4 x 3 = 12 basic territorial units

On the basis of expert advice, we have chosen 12 basic territorial units, which are in fact associated with a first division of the world in 4 regions, each of them divided in 3 subregions.

The authors of this partition of the world suggest that the world economy has been (at least during a period of time) or could have been (wishful thinking ?) organized around three integrated “vertical macroregions” and one residual part of the world, less integrated and subject to the variable influence of the three vertical regions :

  • G1 : Europe-Mediterranea-Africa : Clearly inherited from history, this vertical region is based on various types of proximity, including geographical distance, a common sea (Mare Nostrum), common languages, colonial legacy … But what has become of these links over the last 50 years, following the independence of the African states ?

  • G2 : Americas : Since the 19th century, “the Monroe Doctrine is a United States foreign policy position that opposes European colonialism in the Western Hemisphere. It holds that any intervention in the political affairs of the Americas by foreign powers is a potentially hostile act against the United States” (Wikipedia). This doctrine has been related to many conflicts between the different parts of the Americas, but it has also been associated with the building of various forms of cooperation such as NAFTA (1994), MERCOSUR (1991), etc. In any case, geographical proximity clearly favoured a potential integration here. But the reduction of transport costs in the 1980s has modified the role played by these factors, in favor of trans-Pacific relationships. So, what is the situation of the integration of the Americas over our 50-year period of interest ?

  • G3 : Asia-Pacifica : The economic integration of this part of the world is a long and complex process, initially boosted by Japan and Korea, later by China, and associated with the continuous development of free trade areas like ASEAN. This potential macro-region has been at the same time the pivot of the global economic integration of the world, first through trans-Pacific relations until 1990 and later with the rest of the world, with the growing influence of China after it joined the WTO in 2001. So, is it still a macroregion or the economic core of the contemporary world ?

  • G4 : Rest of the World : We cannot speak here of an integrated economic region but rather of a group of states that (1) benefit from resources of interest for the rest of the world (e.g. oil and gas from the Gulf, mineral products from Russia, …) and/or (2) develop a strategy of diversification of their exchanges at world scale and refuse to depend on overly powerful partners (e.g. the strategies of India, Russia or Saudi Arabia). The question here is to what extent this part of the world has remained “neutral” with respect to the three other ones, or has been successfully associated with the other regions according to variable geometries.

All these remarks are hypotheses that suggest a possible way to cluster the 12 territorial units into 3 or 4 groups. Our aim here is not to validate the partition \((G_1,G_2,G_3,G_4)\) but rather to use it as a starting point for the discovery of alternative geometries that change through time or present variable configurations according to the type of products considered.

9 groups of goods

The authors of the CHELEM database have made considerable efforts to maintain a homogeneous categorisation of goods in 72 types of products over a period of 50 years. Considering the changes of the world economy and the evolution of the nomenclatures used by trade organizations, this is a remarkable achievement. We adopt here a simplified version of the CHELEM typology in only 9 groups of products that reflect the distribution of value chains as well as the international division of labor (Grasland and Van Hamme 2010; Grataloup, Boucheron, and Fumey 2014).

    1. ENE : Energy
    2. MIN : Minerals, Intermediate goods
    3. AGR : Agriculture, Food
    4. TEX : Textile, Clothing
    5. ELE : Electronic
    6. EQU : Equipment, Machines
    7. TRA : Transport
    8. CHE : Chemical products
    9. MIS : Others
  • Comment : A statistical advantage of this typology (beyond its relevance in terms of division of labor) is that the groups are relatively balanced in size. Their respective shares follow interesting trends: they can increase (electronic, chemical), decrease (agriculture) or evolve erratically with the variations of prices (energy).

The hypercube with its 5940 cells

On the basis of the previous rules, we have built the expected hypercube with 5940 cells. The flows have been normalized to an arbitrary total sum of 1000000 for each period of ten years and the values have been rounded to zero decimals. For each pair of regions we have introduced the flows in both directions, \(F_{ijkt}\) and \(F_{jikt}\), in order to easily compute the symmetric part of the exchange, called volume, and the asymmetric part, called balance :

  • Volume : \(V_{ijkt} = \frac{(F_{ijkt}+F_{jikt})}{2}\)
  • Balance : \(B_{ijkt} = F_{ijkt}-F_{jikt}\)
i           j             k        t        Fijkt  Fjikt  Vijkt   Bijkt
G11:Europe  G12:Medit.SE  (1) ENE  1971-80   1239  16658   8948  -15419
G11:Europe  G12:Medit.SE  (1) ENE  1981-90    904  16290   8597  -15386
G11:Europe  G12:Medit.SE  (1) ENE  1991-00    652   8143   4398   -7490
G11:Europe  G12:Medit.SE  (1) ENE  2001-10   1371   9031   5201   -7660
G11:Europe  G12:Medit.SE  (1) ENE  2011-20   1942   5057   3499   -3115
G11:Europe  G12:Medit.SE  (2) MIN  1971-80   3849   1489   2669    2361
G11:Europe  G12:Medit.SE  (2) MIN  1981-90   3196   1187   2192    2009
G11:Europe  G12:Medit.SE  (2) MIN  1991-00   2234    875   1554    1359
G11:Europe  G12:Medit.SE  (2) MIN  2001-10   2188   1169   1679    1019
G11:Europe  G12:Medit.SE  (2) MIN  2011-20   1978    936   1457    1042
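The volume and balance formulas can be illustrated on the energy rows of the table above. This is a minimal sketch: the function name `volume_and_balance` is hypothetical, and the values recomputed from the rounded flows may differ by up to one unit from the printed columns, presumably (this is an assumption) because the printed columns were derived before rounding.

```python
def volume_and_balance(f_ij: float, f_ji: float) -> tuple[float, float]:
    """Symmetric part (volume) and asymmetric part (balance) of a pair of flows."""
    volume = (f_ij + f_ji) / 2
    balance = f_ij - f_ji
    return volume, balance


# (period, F_ijkt, F_jikt, printed V_ijkt, printed B_ijkt)
# for i = G11:Europe, j = G12:Medit.SE, k = ENE
energy_rows = [
    ("1971-80", 1239, 16658, 8948, -15419),
    ("1981-90", 904, 16290, 8597, -15386),
    ("1991-00", 652, 8143, 4398, -7490),
    ("2001-10", 1371, 9031, 5201, -7660),
    ("2011-20", 1942, 5057, 3499, -3115),
]

max_gap = 0.0
for period, f_ij, f_ji, v_printed, b_printed in energy_rows:
    v, b = volume_and_balance(f_ij, f_ji)
    # Track the largest difference between recomputed and printed values.
    max_gap = max(max_gap, abs(v - v_printed), abs(b - b_printed))
    print(period, v, b)

print("largest gap with the printed table:", max_gap)
```

A negative balance, as in every energy row here, means that the flow from j to i exceeds the flow from i to j.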

II. DOUBLE CONSTRAINT MODELS & REGIONALISATION

Before addressing the search for an unknown partition, we discuss how to measure the accuracy of an existing partition, which will help us to specify the choice of an optimisation criterion.

We take as example the bilateral trade volumes (\(V_{ijkt}\)), in order to have the same partition for origins and destinations (the problem of asymmetry will be discussed later), and consider the total sum of flows (all products together) in 1991-2000 as a starting example. The existing partition is the division in 4 regions (3 vertical + 1 residual).

Matrix of flows

        G11     G12     G13     G21     G22     G23     G31     G32     G33     G41     G42     G43     Sum
G11       0   24373   11640   64562    5708   11391   50525   16925    4837   16958   11407    5747  224073
G12   24373       0     455    5332     146     838    3061     864     220    1375    1521     566   38751
G13   11640     455       0    4183     112     677    3995     893     197     175     561     853   23742
G21   64562    5332    4183       0   33752   13169   73913   18473    4013    1748    5233    2960  227336
G22    5708     146     112   33752       0    2459    4733     709     118     857     314      96   49003
G23   11391     838     677   13169    2459       0    5105     926     226     372     774     339   36275
G31   50525    3061    3995   73913    4733    5105       0   37009    8435    3648   13332    3444  207199
G32   16925     864     893   18473     709     926   37009       0    3282     755    3671    1853   85360
G33    4837     220     197    4013     118     226    8435    3282       0      46     558     382   22316
G41   16958    1375     175    1748     857     372    3648     755      46       0     448     501   26882
G42   11407    1521     561    5233     314     774   13332    3671     558     448       0    2252   40070
G43    5747     566     853    2960      96     339    3444    1853     382     501    2252       0   18994
Sum  224073   38751   23742  227336   49003   36275  207199   85360   22316   26882   40070   18994 1000000

MOD0 : Double constraint model

Assuming that flows are made of 1000000 discrete events (the total sum of the matrix), we choose as reference (null model) a situation where the exports \(O_i\) and imports \(D_j\) of each spatial unit are known (margins of the matrix) and where the exchanges are randomly distributed. Because of the absence of information on the diagonal of the matrix (trade internal to each unit), the model cannot be solved by a simple estimation but requires an iterative double constraint model taking the form

\(F_{ij}^* = a_i.O_i.b_j.D_j+\epsilon_{ij}\)

Analysis of Deviance Table

Model: poisson, link: log

Response: Vij

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                   131    2101056              
i    11   815299       120    1285757 < 2.2e-16 ***
j    11  1014053       109     271704 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square = 0.871"

This first model accounts for 87% of the initial deviance, which is large but logical considering the unequal size of the territorial units in terms of trade volume.
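The iterative estimation of the double constraint model can be sketched as follows. The output shown above comes from a Poisson regression; the sketch below uses iterative proportional fitting instead, which in principle converges to the same fitted values for this model. The function names are hypothetical, and the null deviance assumes an intercept-only model over the 132 off-diagonal cells. The resulting figures can be set against the deviance table, without any guarantee of an exact match.

```python
import math

# Matrix of flows 1991-2000 (rows and columns in the order G11 ... G43).
V = [
    [0, 24373, 11640, 64562, 5708, 11391, 50525, 16925, 4837, 16958, 11407, 5747],
    [24373, 0, 455, 5332, 146, 838, 3061, 864, 220, 1375, 1521, 566],
    [11640, 455, 0, 4183, 112, 677, 3995, 893, 197, 175, 561, 853],
    [64562, 5332, 4183, 0, 33752, 13169, 73913, 18473, 4013, 1748, 5233, 2960],
    [5708, 146, 112, 33752, 0, 2459, 4733, 709, 118, 857, 314, 96],
    [11391, 838, 677, 13169, 2459, 0, 5105, 926, 226, 372, 774, 339],
    [50525, 3061, 3995, 73913, 4733, 5105, 0, 37009, 8435, 3648, 13332, 3444],
    [16925, 864, 893, 18473, 709, 926, 37009, 0, 3282, 755, 3671, 1853],
    [4837, 220, 197, 4013, 118, 226, 8435, 3282, 0, 46, 558, 382],
    [16958, 1375, 175, 1748, 857, 372, 3648, 755, 46, 0, 448, 501],
    [11407, 1521, 561, 5233, 314, 774, 13332, 3671, 558, 448, 0, 2252],
    [5747, 566, 853, 2960, 96, 339, 3444, 1853, 382, 501, 2252, 0],
]


def fit_double_constraint(obs, tol=1e-4, max_iter=10000):
    """Iterative proportional fitting with a structural zero on the diagonal."""
    n = len(obs)
    row_tot = [sum(row) for row in obs]
    col_tot = [sum(obs[i][j] for i in range(n)) for j in range(n)]
    fit = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        for i in range(n):  # rescale rows to the observed exports O_i
            s = sum(fit[i])
            fit[i] = [x * row_tot[i] / s for x in fit[i]]
        for j in range(n):  # rescale columns to the observed imports D_j
            s = sum(fit[i][j] for i in range(n))
            for i in range(n):
                fit[i][j] *= col_tot[j] / s
        if max(abs(sum(fit[i]) - row_tot[i]) for i in range(n)) < tol:
            break
    return fit


def poisson_deviance(obs, fit):
    """Poisson deviance over the off-diagonal cells."""
    dev = 0.0
    for i, row in enumerate(obs):
        for j, y in enumerate(row):
            if i != j:
                mu = fit[i][j]
                dev += 2 * ((y * math.log(y / mu) if y > 0 else 0.0) - (y - mu))
    return dev


n = len(V)
fitted = fit_double_constraint(V)
mean_flow = sum(map(sum, V)) / (n * (n - 1))
null_fit = [[0.0 if i == j else mean_flow for j in range(n)] for i in range(n)]
dev_null = poisson_deviance(V, null_fit)
dev_model = poisson_deviance(V, fitted)
pseudo_r2 = 1 - dev_model / dev_null
print(round(dev_null), round(dev_model), round(pseudo_r2, 3))
```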

The analysis of standardized residuals makes it possible to visualize the pairs of units where exchanges are higher or lower than expected. A classification of this matrix of residuals reveals a structure in “blocks” of units that have more internal exchanges than expected.

We notice here that the classification of residuals fits relatively well with the expectations of the experts: on the diagonal we recognize two first groups corresponding to the region Asia-Pacifica \((G_{31},G_{32}, G_{33})\) and the region Americas \((G_{21},G_{22}, G_{23})\). But the next group is limited to only two members of the Rest of the World \((G_{42},G_{43})\), because Russia \((G_{41})\) seems to be more associated with the region Europe-Mediterranea-Africa \((G_{11},G_{12}, G_{13})\).

MOD1 : Regional model with a single parameter

We can try to build a first regional model that assumes the existence of a simple preference effect, with the same value \(\gamma\) for all pairs of units located inside the same region (\(REG = 1\), and \(REG = 0\) otherwise):

\(F_{ij}^* = a_i.O_i.b_j.D_j.\gamma^{REG}+\epsilon_{ij}\)

Despite the analysis made on the residuals, we decide to keep the partition in 4 regions proposed by the experts.

Analysis of Deviance Table

Model: poisson, link: log

Response: Vij

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                   131    2101056              
i    11   815299       120    1285757 < 2.2e-16 ***
j    11  1014053       109     271704 < 2.2e-16 ***
REG   1   160356       108     111348 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Total) = 0.947"
Analysis of Deviance Table

Model 1: Vij ~ i + j
Model 2: Vij ~ i + j + REG
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1       109     271704                          
2       108     111348  1   160356 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Gain) = 0.59"

We obtain a model with a pseudo R-square of 95 % of deviance explained (including the effect of the null model), or 59 % of the residual deviance of the reference model (excluding therefore what has already been explained by the double constraint on origins and destinations). The coefficient \(\gamma\) is highly significant and equal to 3.02, which means that, all else being equal, exchanges between units located in the same region are on average 3 times greater than exchanges between units located in different regions.
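The link between the multiplicative model and the Poisson regression with log link can be written out as a short worked equation. The symbols \(\alpha_i\), \(\beta_j\) and \(REG_{ij}\) are notation introduced here for illustration, not taken from the original output:

\(\log E(F_{ij}) = \alpha_i + \beta_j + \log(\gamma).REG_{ij}\), with \(\alpha_i = \log(a_i.O_i)\) and \(\beta_j = \log(b_j.D_j)\)

Under this reading, the coefficient estimated on the log scale for the regional term is \(\log(\gamma)\), so that \(\gamma = 3.02\) would correspond to a coefficient of \(\log(3.02) \approx 1.105\).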

MOD2 : Regional model with variable integration levels

We can adopt a different perspective and imagine that there are as many values of the parameter \(\gamma_{k}\) as there are regions, with \(REG_{k} = 1\) when both units belong to region \(k\) (here \(k\) indexes the regions, not the groups of products). Our model will therefore take the form

\(F_{ij}^* = a_i.O_i.b_j.D_j.\gamma_{k}^{REG_{k}}+\epsilon_{ij}\)
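The regional indicators entering this formula can be sketched as follows. The dictionary `EXPERT` and the function names are hypothetical; the region index is called `r` in the code to avoid confusion with the product index. The last lines apply the same function to the alternative suggested by the residuals, where Russia (G41) is attached to region 1.

```python
# Expert partition: unit code -> region index.
EXPERT = {
    "G11": 1, "G12": 1, "G13": 1,
    "G21": 2, "G22": 2, "G23": 2,
    "G31": 3, "G32": 3, "G33": 3,
    "G41": 4, "G42": 4, "G43": 4,
}


def regional_dummies(partition):
    """Return {(i, j): {r: 0 or 1}} for every ordered pair of distinct units.

    The flag for region r equals 1 only when both i and j belong to r.
    """
    regions = sorted(set(partition.values()))
    dummies = {}
    for i in partition:
        for j in partition:
            if i != j:
                dummies[(i, j)] = {
                    r: int(partition[i] == r and partition[j] == r)
                    for r in regions
                }
    return dummies


def intra_regional_cells(partition):
    """Count, for each region, the off-diagonal cells whose flag equals 1."""
    counts = {}
    for flags in regional_dummies(partition).values():
        for r, flag in flags.items():
            counts[r] = counts.get(r, 0) + flag
    return counts


expert_counts = intra_regional_cells(EXPERT)
shifted_counts = intra_regional_cells({**EXPERT, "G41": 1})
print(expert_counts)
print(shifted_counts)
```

The single indicator of the model with one parameter is simply the sum of the four flags of a pair, since a pair can be internal to one region at most.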

Analysis of Deviance Table

Model: poisson, link: log

Response: Vij

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                   131    2101056              
i    11   815299       120    1285757 < 2.2e-16 ***
j    11  1014053       109     271704 < 2.2e-16 ***
REG1  1    66272       108     205433 < 2.2e-16 ***
REG2  1    71893       107     133540 < 2.2e-16 ***
REG3  1    30040       106     103500 < 2.2e-16 ***
REG4  1      467       105     103034 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Total) = 0.951"
Analysis of Deviance Table

Model 1: Vij ~ i + j
Model 2: Vij ~ i + j + REG
Model 3: Vij ~ i + j + REG1 + REG2 + REG3 + REG4
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1       109     271704                          
2       108     111348  1   160356 < 2.2e-16 ***
3       105     103034  3     8314 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Gain) = 0.621"

This model now accounts for 95.1 % of the total deviance and 62.1 % of the residual deviance of the reference model. It offers a significant improvement over the previous model and reveals that the levels of integration differ between regions. The most integrated regions are Americas (\(\gamma_2=3.84\)) and Europe-Mediterranea-Africa (\(\gamma_1=3.72\)), followed by Asia-Pacifica (\(\gamma_3=2.45\)) and finally the Rest of the World (\(\gamma_4=1.36\)).

MOD3 : Moving Russia toward the Europe-Mediterranea-Africa region

In the previous analysis we followed the expert advice concerning the division of the world in 4 regions. But we can ask whether this choice was really optimal. Looking at the residuals of the reference model, we can imagine another partition of the world in four groups where Russia is associated with the region Europe-Mediterranea-Africa. What would be the result ?

Analysis of Deviance Table

Model: poisson, link: log

Response: Vij

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                   131    2101056              
i    11   815299       120    1285757 < 2.2e-16 ***
j    11  1014053       109     271704 < 2.2e-16 ***
REG1  1   113899       108     157805 < 2.2e-16 ***
REG2  1    64655       107      93150 < 2.2e-16 ***
REG3  1    24385       106      68765 < 2.2e-16 ***
REG4  1     3706       105      65059 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Total) = 0.969"
Analysis of Deviance Table

Model 1: Vij ~ i + j
Model 2: Vij ~ i + j + REG
Model 3: Vij ~ i + j + REG1 + REG2 + REG3 + REG4
Model 4: Vij ~ i + j + REG1 + REG2 + REG3 + REG4
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1       109     271704                          
2       108     111348  1   160356 < 2.2e-16 ***
3       105     103034  3     8314 < 2.2e-16 ***
4       105      65059  0    37975              
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Gain) = 0.761"

This model now accounts for 96.9 % of the total deviance and 76.1 % of the residual deviance of the reference model. It offers a substantial improvement over the previous model (the residual deviance falls from 103034 to 65059 with the same number of parameters) and modifies the levels of integration of each region. The integration of Europe-Mediterranea-Africa extended to Russia increases (\(\gamma_1=4.74\)), while a small decrease is observed in Americas (\(\gamma_2=3.54\)) and in Asia-Pacifica (\(\gamma_3=2.24\)). We also observe a strong increase of integration in the remaining part of the Rest of the World (\(\gamma_4=3.1\), against 1.36 before).
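The pseudo R-squares quoted for the successive models can be recomputed from the deviances printed in the tables. This is a minimal sketch: the dictionary keys are descriptive labels chosen here, and the two functions simply apply the ratio of deviances that the printed values appear to follow.

```python
NULL_DEVIANCE = 2101056  # deviance of the intercept-only model

# Residual deviances read from the analysis of deviance tables.
RESIDUAL_DEVIANCE = {
    "double constraint": 271704,
    "single gamma": 111348,
    "gamma per region": 103034,
    "gamma per region, Russia moved": 65059,
}


def pseudo_r2_total(deviance: float) -> float:
    """Share of the null deviance explained by the model."""
    return 1 - deviance / NULL_DEVIANCE


def pseudo_r2_gain(deviance: float) -> float:
    """Share of the deviance left by the double constraint model that is explained."""
    return 1 - deviance / RESIDUAL_DEVIANCE["double constraint"]


for name, deviance in RESIDUAL_DEVIANCE.items():
    print(name, round(pseudo_r2_total(deviance), 3), round(pseudo_r2_gain(deviance), 3))
```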

MOD2(t) : The time dimension

We now replicate model 2 for each time period in order to examine the variations of regional integration.

Analysis of Deviance Table

Model: poisson, link: log

Response: Vijkt

Terms added sequentially (first to last)

       Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                    5939   12089700              
i:t    59  3362205      5880    8727495 < 2.2e-16 ***
t:j    55  4213874      5825    4513621 < 2.2e-16 ***
t:REG1  5   273136      5820    4240485 < 2.2e-16 ***
t:REG2  5   323063      5815    3917422 < 2.2e-16 ***
t:REG3  5   198226      5810    3719196 < 2.2e-16 ***
t:REG4  5    10156      5805    3709039 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Parameters (gamma) of regional integration by time period
                   reg   1971-80  1981-90  1991-00  2001-10  2011-20
Eur-Med-Afr        REG1     2.87     3.76     3.72     2.35     2.25
Americas           REG2     3.03     2.88     3.84     4.50     4.49
Asia-Pacifica      REG3     4.29     3.11     2.45     2.76     2.82
Rest of the World  REG4     0.44     0.71     1.35     1.26     1.46

  • Comment : The introduction of time reveals variations of regional integration through time. For example, the region Eur-Med-Afr reaches its maximum integration in 1981-1990 and 1991-2000, with lower levels before and after. The region Americas, on the contrary, reaches its maximum integration in the final periods 2001-2010 and 2011-2020. The region Asia-Pacifica was very integrated in 1971-80 and experienced a decrease until 1991-2000 before slowly increasing again.

MOD2(k) : The product dimension

Here, we replicate model 2 for the final time period 2011-2020, but we examine separately the level of integration for each group of products.

Analysis of Deviance Table

Model: poisson, link: log

Response: Vijkt

Terms added sequentially (first to last)

        Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
NULL                     1187    2155633              
i:k    107   828616      1080    1327018 < 2.2e-16 ***
k:j     99   909782       981     417236 < 2.2e-16 ***
k:REG1   9    38441       972     378794 < 2.2e-16 ***
k:REG2   9   103555       963     275239 < 2.2e-16 ***
k:REG3   9    48475       954     226764 < 2.2e-16 ***
k:REG4   9     7590       945     219174 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Parameters (gamma) of regional integration by products in 2011-2020
                   reg     AGR    CHE    ELE    ENE    EQU    MIN    MIS    TEX    TRA
Eur-Med-Afr        REG1   2.34   1.64   5.07   3.97   2.33   2.12   0.60   3.10   2.98
Americas           REG2   2.18   3.28   6.50  22.40   4.38   2.14  12.16   4.53   4.36
Asia-Pacifica      REG3   2.62   4.13   1.12   4.73   2.81   7.16   7.45   1.41   2.16
Rest of the World  REG4   1.91   1.47   1.31   0.40   0.81   1.08   3.33   1.11   1.06
  • Comment : The table reveals very important differences in the degree of regional integration of trade within the same region when we consider different products. For example, the region Americas is very strongly integrated for energy (\(\gamma = 22.4\)) because of its relative autonomy for the provision of gas, oil or coal. It is less integrated for agriculture and food (\(\gamma = 2.2\)) because of the high level of exports and imports with the rest of the world.

Discussion : what is the best regionalization ?

The sequence of models indicates that the simple validation of an existing partition does not guarantee that we have found the optimal solution. In our example, we should certainly explore all the possible partitions before validating our final model as the best partition of world trade in 4 regions.
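The size of this search space can be estimated with Stirling numbers of the second kind, which count the ways to split a set of units into a given number of non-empty, unlabelled groups. This is a sketch under the assumption that any grouping of the 12 units is admissible (no contiguity constraint); the function name `stirling2` is chosen here for illustration.

```python
from math import comb, factorial


def stirling2(n: int, k: int) -> int:
    """Number of partitions of n distinct units into k non-empty groups."""
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)


partitions_in_4 = stirling2(12, 4)
partitions_any = sum(stirling2(12, k) for k in range(1, 13))

for k in (2, 3, 4, 5):
    print(k, "regions:", stirling2(12, k))
print("any number of regions:", partitions_any)
```

Even with only 12 units, an exhaustive comparison of all partitions in 4 regions means fitting several hundred thousand models, for one period and one type of product.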

We also have to consider that the decision to choose 4 regions is not necessarily optimal: more interesting results could perhaps be achieved with a partition in 2, 3 or 5 regions. But in this case we have to propose an optimisation criterion like AIC or BIC, which takes into account the number of classes used. Finally, our results suggest that:

  1. The optimal partition in 1991-2000 is not necessarily the best at another time period.
  2. The optimal partition for one type of goods is not necessarily the same for another type of goods.
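The penalised criteria mentioned above can be sketched from the figures already printed in the deviance tables. For models fitted on the same data, AIC and BIC are equal to the residual deviance plus a penalty, up to a constant common to all models, so only their differences are meaningful. The labels, the function names and the count of 132 cells are assumptions made for this illustration; partitions in 2, 3 or 5 regions are not fitted in this paper, so only the four available models are compared.

```python
import math

N_CELLS = 132  # off-diagonal cells of the 12 x 12 matrix

# (residual degrees of freedom, residual deviance) read from the tables.
MODELS = {
    "double constraint": (109, 271704),
    "single gamma": (108, 111348),
    "gamma per region": (105, 103034),
    "gamma per region, Russia moved": (105, 65059),
}


def n_params(resid_df: int) -> int:
    """Number of estimated parameters implied by the residual degrees of freedom."""
    return N_CELLS - resid_df


def relative_aic(resid_df: int, deviance: float) -> float:
    """AIC up to an additive constant shared by all models."""
    return deviance + 2 * n_params(resid_df)


def relative_bic(resid_df: int, deviance: float) -> float:
    """BIC up to an additive constant shared by all models."""
    return deviance + math.log(N_CELLS) * n_params(resid_df)


for name, (df, dev) in MODELS.items():
    print(name, n_params(df), relative_aic(df, dev), round(relative_bic(df, dev), 1))

best_by_aic = min(MODELS, key=lambda name: relative_aic(*MODELS[name]))
best_by_bic = min(MODELS, key=lambda name: relative_bic(*MODELS[name]))
print(best_by_aic, best_by_bic)
```

With deviances of this magnitude the penalty is very small compared with the differences of fit, so both criteria rank the four models in the same order as the residual deviance.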

In other words, the question of optimal regionalisation is very complex but also very exciting …

ANNEX : BIBLIOGRAPHY

Grasland, Claude, and Gilles Van Hamme. 2010. “La Relocalisation Des Activités Industrielles: Une Approche Centre-Périphérie Des Dynamiques Mondiale Et Européenne.” Espace Géographique 39 (1): 1-19. https://www.cairn.info/revue-espace-geographique-2010-1-page-1.htm.
Grataloup, Christian, Patrick Boucheron, and Gilles Fumey. 2014. Atlas Global. Les Arènes. https://hal.science/hal-03315891/.